More cool things with mutate

June 6, 2017

Using mutate can sometimes feel like…


But soon you will be a mutate expert and everything will be… awesome.

library("dplyr")
library("purrr")

Mutate - itself

No more nested Ifs!

mutate(grade_group = ifelse(Grade >= 5 & Grade <= 8 & Subject == "ELA", "5-8", 
                            ifelse(Grade >= 5 & Grade <= 8 & Subject == "Math", "5-7", 
                                   ifelse(Grade < 5, "3-4", "Unknown")
                            )))

Correction 1 - multiple if statements, on their own:

mutate(grade_group = ifelse(Grade >= 5 & Grade <= 8 & Subject == "ELA", "5-8", NA),
       grade_group = ifelse(Grade >= 5 & Grade <= 8 & Subject == "Math", "5-7", grade_group),
       grade_group = ifelse(Grade < 5, "3-4", grade_group))
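Here is a minimal, runnable sketch of Correction 1 on a tiny made-up table. The values are hypothetical; only the Grade and Subject column names come from the examples above. Each line handles one case and keeps whatever grade_group already holds when its condition is FALSE, so a later line only overwrites the rows it matches.

```r
library(dplyr)

# Hypothetical sample data (values invented for illustration)
results <- tibble(
  Grade   = c(3, 4, 5, 8, 6, 9),
  Subject = c("ELA", "Math", "ELA", "Math", "Math", "ELA")
)

results <- results %>%
  mutate(
    # start with the ELA 5-8 band, everything else NA
    grade_group = ifelse(Grade >= 5 & Grade <= 8 & Subject == "ELA", "5-8", NA),
    # fill in the Math band, keep earlier values otherwise
    grade_group = ifelse(Grade >= 5 & Grade <= 8 & Subject == "Math", "5-7", grade_group),
    # grades below 5 get "3-4" regardless of subject
    grade_group = ifelse(Grade < 5, "3-4", grade_group)
  )

results$grade_group
```

The grade 9 row matches none of the rules and stays NA.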

Correction 2 - translation table

grade_group <- tibble(grade = 3:8, 
                      ela_group = c("3-4", "3-4", rep("5-8", 4)), 
                      math_group = c("3-4", "3-4", rep("5-7", 4))
                      )
grade_group
## # A tibble: 6 x 3
##   grade ela_group math_group
##   <int>     <chr>      <chr>
## 1     3       3-4        3-4
## 2     4       3-4        3-4
## 3     5       5-8        5-7
## 4     6       5-8        5-7
## 5     7       5-8        5-7
## 6     8       5-8        5-7
results <- results %>% left_join(grade_group, by="grade")
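The join brings in both lookup columns, so you still need to pick the one that matches each row's subject. A sketch of the full round trip, assuming results has lowercase grade and subject columns (the sample rows are made up):

```r
library(dplyr)

# Hypothetical results table; grade is an integer so it matches 3:8 below
results <- tibble(
  grade   = c(3L, 5L, 8L, 6L),
  subject = c("Math", "ELA", "Math", "Math")
)

grade_group <- tibble(grade = 3:8,
                      ela_group  = c("3-4", "3-4", rep("5-8", 4)),
                      math_group = c("3-4", "3-4", rep("5-7", 4)))

results <- results %>%
  left_join(grade_group, by = "grade") %>%
  # choose the lookup column that matches each row's subject
  mutate(grade_group = ifelse(subject == "ELA", ela_group, math_group)) %>%
  # drop the helper columns once they have been used
  select(-ela_group, -math_group)
```

The nice part of this approach is that changing a band means editing the lookup table, not the logic.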

Correction 3 - case_when()

mutate(grade_group = case_when(
  Grade < 5 ~ "3-4",
  Grade <= 8 & Subject == "ELA" ~ "5-8",
  Grade <= 8 & Subject == "Math" ~ "5-7",
  TRUE ~ NA_character_  # typed NA: older dplyr versions need every branch to return the same type
))
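case_when() checks its conditions from top to bottom and uses the first one that is TRUE. That is why the Grade <= 8 lines do not need a Grade >= 5 check: rows below 5 were already caught by the first line. A runnable sketch on hypothetical data:

```r
library(dplyr)

# Hypothetical sample data (values invented for illustration)
results <- tibble(
  Grade   = c(3, 4, 5, 8, 6, 9),
  Subject = c("ELA", "Math", "ELA", "Math", "Math", "ELA")
)

results <- results %>%
  mutate(grade_group = case_when(
    Grade < 5 ~ "3-4",                       # checked first
    Grade <= 8 & Subject == "ELA" ~ "5-8",   # only reached when Grade >= 5
    Grade <= 8 & Subject == "Math" ~ "5-7",
    TRUE ~ NA_character_                     # anything left over
  ))

results$grade_group
```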

group_by and mutate

Example: I have a file with student profile info, but each student has one record per school year… so Ying has a 9th-grade, a 10th-grade, and an 11th-grade record. I want to create a file with one row per student, keeping only the latest year.

df %>%
  group_by(student_id) %>% 
  mutate(max_year = max(school_year)) %>% 
  filter(school_year == max_year) %>% 
  ungroup()
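A runnable version of this pattern on a few made-up rows (the student_id values 101 and 102 are hypothetical). Student 102's latest year is 2016, which shows why the group_by() matters: without it, max(school_year) would be 2017 for every row and student 102 would be filtered out entirely.

```r
library(dplyr)

# Hypothetical profile data: one row per student per school year
df <- tibble(
  student_id  = c(101, 101, 101, 102, 102),
  school_year = c(2015, 2016, 2017, 2015, 2016),
  grade       = c(9, 10, 11, 9, 10)
)

latest <- df %>%
  group_by(student_id) %>%
  mutate(max_year = max(school_year)) %>%  # max is computed within each student
  filter(school_year == max_year) %>%      # keep only that student's latest row
  ungroup()

latest
```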

Example: I want to rank schools by their scores within each subject.

schools %>% 
  group_by(subject) %>%
  mutate(rank = min_rank(percent_correct))
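Note that min_rank() gives rank 1 to the smallest value, so the code above ranks the lowest-scoring school first; wrapping the column in desc() flips that. Tied schools share the lowest rank and leave a gap after them. A sketch with hypothetical schools:

```r
library(dplyr)

# Hypothetical school scores
schools <- tibble(
  school          = c("A", "B", "C", "D", "E"),
  subject         = c("Math", "Math", "Math", "ELA", "ELA"),
  percent_correct = c(70, 85, 70, 60, 90)
)

ranked <- schools %>%
  group_by(subject) %>%
  mutate(rank_low_first  = min_rank(percent_correct),        # lowest score = 1
         rank_high_first = min_rank(desc(percent_correct))) %>%  # highest score = 1
  ungroup()

ranked
```

In Math, A and C tie at 70, so they share rank 1 in rank_low_first and B jumps to rank 3.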

Update Multiple Columns at Once

mutate_at/summarise_at:

  • Useful when you need to apply the same function to many columns (e.g. as.numeric()).
  • It mutates all of the selected columns at once.
  • You need to tell it which columns to look at (vars()) and what to do with them (funs()).
  • You can give it multiple columns and multiple functions; every function is applied to every column.
mutate_at(vars(), funs())

Example

results %>%
  mutate_at(vars(score, grade), as.numeric)
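A runnable sketch of both ideas from the list above, on a hypothetical results table whose numbers were read in as text. The second step passes a named list of functions (list() rather than funs(), which later dplyr releases deprecated), so each output column is named column_function.

```r
library(dplyr)

# Hypothetical results table where the numbers came in as character
results <- tibble(
  student_id = c("a1", "a2", "a3"),
  score      = c("80", "95", "70"),
  grade      = c("3", "4", "4")
)

# One function, two columns: both are converted in place
results <- results %>%
  mutate_at(vars(score, grade), as.numeric)

# Two functions, two columns: four summary columns
# (score_mean, grade_mean, score_max, grade_max)
summary_tbl <- results %>%
  summarise_at(vars(score, grade), list(mean = mean, max = max))

summary_tbl
```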

mutate_if/summarise_if:

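Inputs: a test (a predicate function such as is.numeric) that picks the columns, and the function(s) to apply to the columns that pass it.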

summarise_if(is.numeric, funs(mean, min, max, sd, n))
mutate_if(is.numeric, as.character)
mutate_if(is.numeric, as.factor)
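mutate_if() keeps every row and transforms whole columns, while summarise_if() collapses each selected column to one value per function. A sketch on hypothetical data, where the predicate is.numeric is tested on every column and only score and grade pass:

```r
library(dplyr)

# Hypothetical data: one text column, two numeric columns
results <- tibble(
  school = c("A", "B", "C"),
  score  = c(80, 95, 70),
  grade  = c(3, 4, 4)
)

# Convert only the numeric columns to text; school is left alone
as_text <- results %>% mutate_if(is.numeric, as.character)

# One row out: a mean and a max for each numeric column
stats <- results %>% summarise_if(is.numeric, list(mean = mean, max = max))

as_text
stats
```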

Exercise

Task 1

Task 2